Abstract: Crowdsourcing utilizes the wisdom of crowds for
collective classification via information (e.g., labels of an item)
provided by labelers. Current crowdsourcing algorithms are
mainly unsupervised methods that are unaware of the quality
of crowdsourced data. In this paper, we propose a supervised
collective classification algorithm that aims to identify reliable
labelers from the training data (e.g., items with known labels).
The reliability (i.e., weighting factor) of each labeler is determined
via a saddle point algorithm. The results on several
crowdsourced datasets show that supervised methods can achieve
better classification accuracy than unsupervised methods, and
that our proposed method outperforms the other algorithms compared.

I. Introduction 

In recent years, collective decision making based on the
wisdom of crowds has attracted great attention in different
fields [1], particularly for social-networking-empowered technology
[2]-[5] such as the trust-based social Internet of Things
(IoT) paradigm [6]-[8]. Collective classification leverages the
wisdom of crowds to perform machine learning tasks by
acquiring multiple labels from crowds to infer the ground-truth
label. For instance, websites such as Galaxy Zoo ask visitors
to help classify the shapes of galaxies, and Stardust@home
asks visitors to help detect interstellar dust particles in astronomical
images. In addition, new business models based on
crowdsourcing [9] have emerged in the past few years. For
instance, Amazon Mechanical Turk (MTurk) and CrowdFlower
provide crowdsourcing services at low prices. On MTurk,
a minimum of 0.01 US dollar is paid to a labeler/worker when
she makes a click (i.e., generates a label) on an item. An
illustration is given in Fig. 1.

Despite the low cost of acquiring labels, one prominent
challenge for collective classification lies in dealing with the
massive yet potentially incorrect labels provided by labelers.
These incorrect labels may hinder the accuracy of collective
classification when unsupervised collective classification
methods (e.g., majority vote) are used for crowdsourcing.
Unsupervised collective classification using the expectation
maximization (EM) algorithm [10] was first proposed in [11].
A refined EM algorithm was then proposed in [12], which
is shown to outperform majority vote. In [13], a minimax
entropy regularization approach is proposed to minimize the
Kullback-Leibler (KL) divergence between the probability
generating function of the observed data and the true labels.
Some data selection heuristics are proposed to identify
high-quality labels/labelers for collective classification based on
weighted majority votes [14], [15].

By allowing a fairly small number of items with known
labels for collective classification, it is shown in [16]-[18]
that supervised collective classification can improve the
classification accuracy at affordable cost. Typical supervised
classification algorithms include the binary support vector
machine [19], the multi-class support vector machine [20], the naive
Bayes sampler [13], [17], [21], and multi-class Adaboost [22].

This paper provides an overview of the aforementioned
methods, and our goal is to propose a supervised collective
classification algorithm that assigns a weight to each labeler
based on the accuracy of their labels. The weights are determined
by a saddle point algorithm, and they reflect
the reliability of labelers. In addition to crowdsourcing, the
proposed method can be applied to other communication
paradigms such as mobile sensing and cooperative wireless
networks by assigning more weight to reliable users. For
performance evaluation, we compare supervised and unsupervised
collective classification algorithms on several crowdsourced
datasets, which include a canonical benchmark dataset and
exam datasets that we collected from the exam answers of
junior high and high school students in Taiwan (footnote 1). The results
show that the proposed method outperforms the others in terms
of classification accuracy.

II. Problem Statement and Notations 

Consider L labelers, N items for classification,
and K label classes for items. Let $i \in \{1, 2, \ldots, L\}$ denote
the i-th labeler, $j \in \{1, 2, \ldots, N\}$ denote the j-th item, and
$x_{ij} \in \{0, 1, 2, \ldots, K\}$ denote the label of item j given by
labeler i, with $x_{ij} = 0$ if item j is not labeled by labeler i. For
supervised data, each item is associated with a set of labels and
a true label, $\{X_j, y_j\}_{j=1}^{N_s}$, where
$X_j = [x_{1j}, x_{2j}, \ldots, x_{Lj}]^T$
and $N_s$ is the number of supervised/training items.

Footnote 1: The exam datasets are collected by the authors and publicly available at
the first author's personal website https://sites.google.com/site/pinyuchenpage

Fig. 1. Illustration of collective classification for crowdsourcing.

For crowdsourced data, $X_j$ might be a sparse vector, where
sparsity is defined as the number of nonzero entries in a vector.
The sparsity originates from the fact that a labeler might only
label a small portion of the items. For the collected multiple-choice
exam data, where labels are answers provided by students,
$X_j$ is in general not a sparse vector. We aim to construct
a classifier $f : \{0, 1, \ldots, K\}^L \mapsto \{1, 2, \ldots, K\}$ for collective
classification based on supervised data. For binary classification,
we use the convention that $x_{ij} \in \{-1, 0, 1\}$ and
$y_j \in \{-1, 1\}$. For multi-class classification, unless stated otherwise, we use
the one-to-all classifier $f(X_j) = \arg\max_{k \in \{1, 2, \ldots, K\}} f_k(X_j)$,
where $f_k(X_j)$ is the binary classifier of item j such that the
labels of class k are 1 and the labels of the other classes
are $-1$. We also denote the predicted label of item j by
$\hat{y}_j$ and the indicator function by $\mathbb{1}_{\{\cdot\}}$, where $\mathbb{1}_{\{e\}} = 1$ if
event e is true and $\mathbb{1}_{\{e\}} = 0$ otherwise. We further define
the weight of each labeler by $w_i$ and define the weighting
vector $w = [w_1, w_2, \ldots, w_L]^T$. For classifiers associated with
weighting vector w, we have $f_k(X_j) = \mathrm{sign}(w^T X_j)$, where
$\mathrm{sign}(z) = 1$ if $z > 0$ and $\mathrm{sign}(z) = -1$ otherwise.

III. Overview of Crowdsourcing Algorithms 
A. Majority Votes (MV) 

Majority voting is the baseline (unsupervised) classifier for
collective classification. The classifier is

$$f^{MV}(X_j) = \arg\max_{k \in \{1, 2, \ldots, K\}} \sum_{i=1}^{L} \mathbb{1}_{\{x_{ij} = k\}}. \tag{1}$$
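As a minimal sketch of the majority vote rule in (1), the following Python function counts the votes for one item. The function name, the tie-breaking rule (smallest class index wins), and the example votes are assumptions made for illustration and are not specified in the text.

```python
from collections import Counter

def majority_vote(labels, num_classes):
    """Majority-vote label for one item, following Eq. (1).

    labels: labels x_ij from the L labelers, each in {0, 1, ..., K},
            where 0 means "not labeled" and is ignored.
    Ties go to the smallest class index (an assumption, not from the paper).
    """
    counts = Counter(x for x in labels if x != 0)
    # max() returns the first maximal element, so ties resolve to the smallest k.
    return max(range(1, num_classes + 1), key=lambda k: counts[k])

# Hypothetical votes from seven labelers on a 4-class item (two abstain).
votes = [3, 1, 3, 0, 2, 3, 0]
print(majority_vote(votes, num_classes=4))
```

Unlabeled entries are skipped rather than counted, which matters for sparse crowdsourced data where most labelers abstain on most items.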


B. Weighted Averaging (WA) 


Weighted averaging is a heuristic approach for weight
assignment based on the classification accuracy of each labeler
in the training data. Let $q_i^{WA}$ be the number of correctly
classified items out of the supervised items for labeler i. We
define the weight to be

$$w_i^{WA} = \frac{q_i^{WA}}{\sum_{l=1}^{L} q_l^{WA}}. \tag{2}$$
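The weight assignment in (2) can be sketched in a few lines of Python. The data layout (`X_train[j][i]` holds the label of labeler i for training item j) and the uniform fallback when no labeler is ever correct are assumptions made for this illustration.

```python
def weighted_averaging_weights(X_train, y_train):
    """Weights of Eq. (2): fraction of all correct training labels owned by each labeler.

    X_train[j][i]: label of labeler i for training item j (0 = unlabeled).
    y_train[j]:    true label of training item j.
    """
    num_labelers = len(X_train[0])
    # q[i] = number of training items that labeler i classified correctly
    q = [sum(1 for xj, yj in zip(X_train, y_train) if xj[i] == yj)
         for i in range(num_labelers)]
    total = sum(q)
    if total == 0:
        # Fallback to uniform weights (an assumption; the text does not cover this case).
        return [1.0 / num_labelers] * num_labelers
    return [qi / total for qi in q]

# Three labelers, three training items (hypothetical data).
X_train = [[1, 2, 1], [2, 2, 0], [1, 1, 2]]
y_train = [1, 2, 1]
print(weighted_averaging_weights(X_train, y_train))
```

In this example the labelers are correct on 3, 2, and 1 items respectively, so the weights are proportional to 3 : 2 : 1.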


C. Exponential Weighted Algorithm (EWA) 

The exponential weighted algorithm sequentially adjusts the weight
of each labeler based on the loss between the predicted label and
the true label [24]. For the sake of convenience, let $x_{ij} \in \{0, 1\}$
be the binary labels and $\ell(\hat{y}_j, y_j) = (\hat{y}_j - y_j)^2$ be the loss
function. Let $w_{i,j}^{EWA}$ be the weight of labeler i at stage j with
initial value $w_{i,0}^{EWA} = 1/L$. The goal of EWA is to achieve
low regret $R_{N_s}$, defined as

$$R_{N_s} = \sum_{j=1}^{N_s} \ell(\hat{y}_j, y_j) - \min_{i \in \{1, 2, \ldots, L\}} \sum_{j=1}^{N_s} \ell(x_{ij}, y_j), \tag{3}$$

the difference in loss between collective classification and the
best labeler. The predicted label for item j is

$$\hat{y}_j = \mathrm{ceil}\left( \frac{\sum_{i=1}^{L} w_{i,j}^{EWA} x_{ij}}{\sum_{i=1}^{L} w_{i,j}^{EWA}} \right), \tag{4}$$

where $\mathrm{ceil}(z)$ is the ceiling function, i.e., the
smallest integer that is not less than z. The weight is updated
according to

$$w_{i,j+1}^{EWA} = w_{i,j}^{EWA} \exp\left( -\eta \, \ell(x_{ij}, y_j) \right). \tag{5}$$

By setting $\eta = \sqrt{8 \ln L / N_s}$, it is proved in [24]
that $R_{N_s} \leq \sqrt{\frac{N_s}{2} \ln L}$. That is, the regret to the best labeler scales with
$O(\sqrt{N_s})$ and the average regret per training sample scales
with $O(1/\sqrt{N_s})$.
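The multiplicative update in (5) can be illustrated with the short Python sketch below. The function name, the final normalization of the weights, and the default step size $\sqrt{8 \ln L / N_s}$ (the standard choice for exponentially weighted forecasters) are assumptions for this example rather than details confirmed by the text.

```python
import math

def ewa_weights(X_train, y_train, eta=None):
    """Run the exponential weight update of Eq. (5) over the training items.

    X_train[j][i] in {0, 1}: binary label of labeler i for training item j.
    y_train[j]    in {0, 1}: true label of training item j.
    Returns the weights after the last update, normalized to sum to one
    (the normalization is an illustrative choice).
    """
    num_items, num_labelers = len(X_train), len(X_train[0])
    if eta is None:
        # Assumed default step size for this sketch.
        eta = math.sqrt(8.0 * math.log(num_labelers) / num_items)
    w = [1.0 / num_labelers] * num_labelers          # w_{i,0} = 1/L
    for xj, yj in zip(X_train, y_train):
        # Squared loss of each labeler on this item shrinks its weight.
        w = [wi * math.exp(-eta * (xij - yj) ** 2) for wi, xij in zip(w, xj)]
    total = sum(w)
    return [wi / total for wi in w]

# Labeler 0 is always right, labeler 1 always wrong (hypothetical data).
print(ewa_weights([[1, 0], [1, 0]], [1, 1], eta=1.0))
```

A labeler that keeps incurring loss has its weight reduced geometrically, so the weight mass concentrates on the labelers with the smallest cumulative loss.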


D. Multi-class Adaboost (MC-Ada) 

For multi-class Adaboost [22], each labeler acts as a weak
classifier $f_i(\cdot)$, and in the training stage it finds the best
labeler according to a specified error function. In each round
the weights of the supervised data are updated so that the
algorithm can find a labeler with better classification capability
for the misclassified training samples. The weight of
each classifier is determined according to the error function,
and the final classifier is the weighted combination of every
labeler. The multi-class Adaboost algorithm proposed in [22]
is summarized as follows:

1) Initialize all weights of the training data to be $\alpha_j = \frac{1}{N_s}$.

For i = 1, 2, ..., L repeat Step 2 to Step 5.

2) Find the best labeler that minimizes the error
$err_i = \sum_{j=1}^{N_s} \alpha_j \mathbb{1}_{\{y_j \neq f_i(X_j)\}}$.

3) Set $w_i = \log \frac{1 - err_i}{err_i} + \ln(K - 1)$.

4) Set $\alpha_j \leftarrow \alpha_j \exp\left( w_i \mathbb{1}_{\{y_j \neq f_i(X_j)\}} \right)$ for $j = 1, 2, \ldots, N_s$.

5) Normalize $\alpha$ to have unit norm.

6) The final classifier is
$f^{MC\text{-}Ada}(X_j) = \arg\max_{k \in \{1, 2, \ldots, K\}} \sum_{i=1}^{L} w_i \mathbb{1}_{\{k = f_i(X_j)\}}$.
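The steps above can be sketched in Python as follows, treating each labeler's raw label as the output of a weak classifier. Several details are assumptions of this sketch and not taken from the text: the error is clipped by a hypothetical `eps` so the logarithm stays finite for a perfect labeler, the sample weights are normalized to sum to one, unlabeled entries count as errors, and the loop stops early if the best labeler is no better than random guessing.

```python
import math

def mc_adaboost_weights(X_train, y_train, K, eps=1e-10):
    """Labeler weights from a SAMME-style multi-class Adaboost over labelers."""
    num_items, num_labelers = len(X_train), len(X_train[0])
    alpha = [1.0 / num_items] * num_items            # Step 1: uniform sample weights
    w = [0.0] * num_labelers
    for _ in range(num_labelers):
        # Step 2: weighted error of every labeler, pick the best one.
        errs = [sum(a for a, xj, yj in zip(alpha, X_train, y_train) if xj[i] != yj)
                for i in range(num_labelers)]
        best = min(range(num_labelers), key=errs.__getitem__)
        err = min(max(errs[best], eps), 1.0 - eps)   # clipping is an assumption
        # Step 3: weight of the selected labeler.
        w_best = math.log((1.0 - err) / err) + math.log(K - 1)
        if w_best <= 0.0:
            break                                    # no better than random guessing
        w[best] += w_best
        # Step 4: up-weight the samples this labeler got wrong.
        alpha = [a * math.exp(w_best) if xj[best] != yj else a
                 for a, xj, yj in zip(alpha, X_train, y_train)]
        # Step 5: renormalize (sum-to-one here).
        total = sum(alpha)
        alpha = [a / total for a in alpha]
    return w

def mc_adaboost_predict(w, xj, K):
    """Step 6: weighted vote of the labelers for one item."""
    scores = [0.0] * (K + 1)
    for wi, xij in zip(w, xj):
        if xij != 0:
            scores[xij] += wi
    return max(range(1, K + 1), key=lambda k: scores[k])

# Hypothetical data: labeler 0 is perfect on the training items.
X_train = [[1, 2, 1], [2, 2, 1], [3, 1, 3]]
y_train = [1, 2, 3]
w = mc_adaboost_weights(X_train, y_train, K=3)
print(mc_adaboost_predict(w, [2, 1, 1], K=3))
```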


E. Conventional Support Vector Machine (C-SVM) 

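Given supervised data $\{X_j, y_j\}_{j=1}^{N_s}$, conventional SVM aims
to solve the optimization problem [23]

$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|_2^2 + \frac{C}{N_s} \sum_{j=1}^{N_s} \xi_j \tag{6}$$

subject to $y_j (w^T X_j + b) \geq 1 - \xi_j$, $\xi_j \geq 0$ $\forall j \in \{1, 2, \ldots, N_s\}$,

where $\xi_j \geq 0$ is the soft margin and C is a tuning parameter.
By representing (6) in dual form, it is equivalent to solving

$$\max_{\sigma} \; -\frac{1}{2} \sum_{j=1}^{N_s} \sum_{z=1}^{N_s} \sigma_j \sigma_z y_j y_z X_j^T X_z + \sum_{j=1}^{N_s} \sigma_j \tag{7}$$

subject to $0 \leq \sigma_j \leq \frac{C}{N_s}$ $\forall j \in \{1, 2, \ldots, N_s\}$.

Let $\sigma^* = [\sigma_1^*, \sigma_2^*, \ldots, \sigma_{N_s}^*]^T$ be the solution of (7). Then the
optimal weight and intercept in (6) are $w^* = \sum_{j=1}^{N_s} \sigma_j^* y_j X_j$,
and $b^*$ is the average value of $y_j - w^{*T} X_j$ over all j such that
$0 < \sigma_j^* < \frac{C}{N_s}$. Therefore the binary classifier is

$$f^{C\text{-}SVM}(X_j) = \mathrm{sign}\left( w^{*T} X_j + b^* \right) = \mathrm{sign}\left( \sum_{z=1}^{N_s} \sigma_z^* y_z X_z^T X_j + b^* \right). \tag{8}$$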

F. Multi-class Support Vector Machine (MC-SVM)

In [20], a multi-class support vector machine approach
is proposed by imposing a generalized hinge loss function
as a convex surrogate loss function of the training error.
Let $M = [M_1, M_2, \ldots, M_K]^T$ be the matrix containing all
weighting vectors, with $M_k \in \mathbb{R}^L$ for class k. The generalized
hinge loss function is defined as
$\max_{r \in \{1, 2, \ldots, K\}} \{ M_r^T X_j + 1 - \delta_{y_j, r} \} - M_{y_j}^T X_j$,
where $\delta_{y_j, r}$ is the Kronecker delta function
such that $\delta_{y_j, r} = 1$ if $y_j = r$ and $\delta_{y_j, r} = 0$ otherwise.

By introducing the concepts of soft margins and the canonical
form of separating hyperplanes, MC-SVM aims to solve the
optimization problem

$$\min_{M, \xi} \; \frac{\lambda}{2} \|M\|_2^2 + \sum_{j=1}^{N_s} \xi_j \tag{9}$$

subject to $M_{y_j}^T X_j + \delta_{y_j, k} - M_k^T X_j \geq 1 - \xi_j$ $\forall j, k$,

where $\xi = [\xi_1, \xi_2, \ldots, \xi_{N_s}]^T$ is the vector of slack variables
accounting for margins with $\xi_j \geq 0$, $\|M\|_2^2 = \sum_{k,i} M_{k,i}^2$ is
defined as the squared $\ell_2$-norm of the vector formed by the
concatenation of M's rows, and $\lambda$ is the regularization coefficient.

Let $\mathbf{1}_z$ be a vector of zero entries except that its z-th entry
is 1, $\mathbf{1}$ be the vector of all ones, and $\tau = [\tau_1, \tau_2, \ldots, \tau_{N_s}]$
be a K-by-$N_s$ matrix. The optimization problem in (9) can be
solved in dual form by

$$\max_{\tau} \; -\frac{1}{2} \sum_{j=1}^{N_s} \sum_{z=1}^{N_s} \left( X_j^T X_z \right) \left( \tau_j^T \tau_z \right) + \lambda \sum_{j=1}^{N_s} \tau_j^T \mathbf{1}_{y_j} \tag{10}$$

subject to $\tau_j \leq \mathbf{1}_{y_j}$, $\mathbf{1}^T \tau_j = 0$ $\forall j$.

Consequently the classifier for MC-SVM is

$$f^{MC\text{-}SVM}(X_j) = \arg\max_{k \in \{1, 2, \ldots, K\}} \sum_{z=1}^{N_s} \tau_{k,z} X_z^T X_j. \tag{11}$$

For algorithmic implementation using multi-class SVM, $X_j$ is
extended to a $(K \times L)$-by-1 vector, where the label for item j
provided by labeler i is represented as $\mathbf{1}_{x_{ij}}$. For instance, the
label 3 of a 4-class SVM is represented by $[0 \; 0 \; 1 \; 0]^T$.
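The extension of $X_j$ to a $(K \times L)$-by-1 vector can be illustrated with a small Python helper. The function name is hypothetical, and mapping an unlabeled entry (label 0) to an all-zero block is an assumption that the text does not spell out.

```python
def one_hot_encode(xj, K):
    """Expand a length-L label vector into a length K*L vector of 0/1 entries.

    Each label x_ij in {1, ..., K} becomes a block of K entries with a single 1
    at position x_ij; an unlabeled entry (0) becomes an all-zero block (assumption).
    """
    encoded = []
    for xij in xj:
        block = [0] * K
        if xij != 0:
            block[xij - 1] = 1
        encoded.extend(block)
    return encoded

# Three labelers on a 4-class item: labels 3, 1, and "not labeled".
print(one_hot_encode([3, 1, 0], K=4))
# [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```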


G. EM Algorithm 

For the sake of convenience, let $x_{ij} \in \{0, 1\}$ be the binary
labels. Following the definitions in [12], let $\alpha_i = P(x_{ij} = 1 | y_j = 1)$
be the probability of correct classification of labeler
i when $y_j = 1$ and $\beta_i = P(x_{ij} = 0 | y_j = 0)$ be the
probability of correct classification of labeler i when $y_j = 0$.
Given the crowdsourced data $X = \{X_1, X_2, \ldots, X_N\}$, the
task is to estimate the parameters $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_L]^T$ and
$\beta = [\beta_1, \beta_2, \ldots, \beta_L]^T$, where we denote the parameters by
$\theta = \{\alpha, \beta\}$.

Assuming the items are independently sampled, the likelihood
function of observing X given $\theta$ is

$$P(X | \theta) = \prod_{j=1}^{N} P(x_{1j}, x_{2j}, \ldots, x_{Lj} | \theta) = \prod_{j=1}^{N} P(X_j | \theta). \tag{12}$$

The maximization problem can be simplified by applying the
EM algorithm [10], which is an efficient iterative procedure
to compute the maximum-likelihood solution in the presence of
missing/hidden data. Here, we regard the unknown hidden true
label $y_j$ as the missing data. Define

$$a_j = \prod_{i=1}^{L} \alpha_i^{x_{ij}} (1 - \alpha_i)^{1 - x_{ij}}, \qquad b_j = \prod_{i=1}^{L} \beta_i^{1 - x_{ij}} (1 - \beta_i)^{x_{ij}}, \tag{13}$$

$$u_j = P(y_j = 1 | X_j, \theta) \propto P(X_j | y_j = 1, \theta) \, P(y_j = 1 | \theta), \qquad u_j = \frac{a_j v}{a_j v + b_j (1 - v)}, \tag{14}$$

by the Bayes rule with $v = \frac{1}{N} \sum_{j=1}^{N} u_j$. The complete log-likelihood
can be written as $\ln P(X, y | \theta) = \sum_{j=1}^{N} y_j \ln (v a_j) + (1 - y_j) \ln ((1 - v) b_j)$.
The EM algorithm is summarized as follows:

E-step: Since $E[\ln P(X, y | \theta)] = \sum_{j=1}^{N} u_j \ln (v a_j) + (1 - u_j) \ln ((1 - v) b_j)$,
where the expectation is with respect to
$P(y | X, \theta)$, we compute $v = \frac{1}{N} \sum_{j=1}^{N} u_j$ and update
$u_j = \frac{a_j v}{a_j v + b_j (1 - v)}$.

M-step: Given the updated posterior probability $u_j$ and the
observed data X, the parameters $\theta$ can be estimated by
maximizing the conditional expectation of the correct classification
probability, i.e., $\alpha_i$ and $\beta_i$, by

$$\alpha_i = \frac{\sum_{j=1}^{N} u_j x_{ij}}{\sum_{j=1}^{N} u_j}, \qquad \beta_i = \frac{\sum_{j=1}^{N} (1 - u_j)(1 - x_{ij})}{\sum_{j=1}^{N} (1 - u_j)}. \tag{15}$$


The binary classifier is built upon the converged posterior
probability $u_j$, i.e., $f^{EM}(X_j) = \mathrm{ceil}(u_j - \frac{1}{2})$. For the
initial condition, the conventional (unsupervised) EM algorithm
adopts $u_j = \frac{1}{L} \sum_{i=1}^{L} x_{ij}$. Since supervised data are
available, we also propose to modify the initial condition to be
$u_j = \sum_{i=1}^{L} w_i^{WA} x_{ij}$, where the weight $w_i^{WA}$ is defined in
Sec. III-B. The experimental results show that the collective
classification accuracy can be improved by setting the initial
condition as the weighted average of the supervised data.
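The E-step and M-step above can be sketched as a short pure-Python loop. This is an illustrative sketch under stated assumptions, not the authors' implementation: every item is assumed to be labeled by every labeler (dense binary data), a hypothetical `eps` clips the estimated probabilities away from 0 and 1 to avoid division by zero, a fixed iteration count replaces a convergence test, and the final decision thresholds the posterior at one half.

```python
import math

def em_binary(X, u_init, iters=50, eps=1e-9):
    """EM for binary crowdsourced labels, following Eqs. (13)-(15).

    X[j][i] in {0, 1}: label of labeler i for item j (dense data assumed).
    u_init[j]:         initial posterior P(y_j = 1), e.g. the mean vote.
    Returns the posterior u_j after `iters` iterations.
    """
    num_items, num_labelers = len(X), len(X[0])
    u = list(u_init)

    def clip(p):
        # Keep probabilities strictly inside (0, 1); an assumption of this sketch.
        return min(max(p, eps), 1.0 - eps)

    for _ in range(iters):
        # M-step, Eq. (15): per-labeler accuracies given the current posteriors.
        su = sum(u)
        v = su / num_items
        alpha = [clip(sum(uj * xj[i] for uj, xj in zip(u, X)) / max(su, eps))
                 for i in range(num_labelers)]
        beta = [clip(sum((1 - uj) * (1 - xj[i]) for uj, xj in zip(u, X))
                     / max(num_items - su, eps))
                for i in range(num_labelers)]
        # E-step, Eqs. (13)-(14): posterior of y_j = 1 for every item.
        new_u = []
        for xj in X:
            a = b = 1.0
            for i, xij in enumerate(xj):
                a *= alpha[i] if xij == 1 else 1.0 - alpha[i]
                b *= 1.0 - beta[i] if xij == 1 else beta[i]
            new_u.append(a * v / (a * v + b * (1.0 - v)))
        u = new_u
    return u

# Hypothetical data: 6 items, 3 labelers; the third labeler is noisier.
X = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
u0 = [sum(xj) / len(xj) for xj in X]          # unsupervised initial condition
u = em_binary(X, u0)
labels = [math.ceil(uj - 0.5) for uj in u]    # threshold at 1/2
print(labels)
```

For the supervised variant described in the text, only `u_init` changes: the mean vote is replaced by the weighted average using the weights of Sec. III-B. With hundreds of labelers the products `a` and `b` can underflow, so a practical version would accumulate logarithms instead.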


H. Naive Bayes (NB) Sampler/Classifier 

The naive Bayes sampler is a generative classifier that
assumes conditional independence among the components of
$X_j$. The classifier can be represented as
$f^{NB}(X_j) = \arg\max_{k \in \{1, 2, \ldots, K\}} \pi_k g_k(X_j)$, where $\pi_k$ is the estimated prior
from the training data, $g_k(X_j)$ is the estimated probability
mass function $g_k(X_j) = \prod_{i=1}^{L} g_k^i(x_{ij})$, and $g_k^i(x_{ij})$ is the
marginal probability mass function of the random variable
$x_{ij} | Y = k$.

The prior is estimated using the Dirichlet prior, which
accounts for a uniformly distributed prior. This alleviates the
problem that data samples of some classes do not appear in
the training data. We thus have the estimators
$\pi_k = \frac{\phi_k + 1}{N_s + K}$
and $g_k^i(z_i) = \frac{\phi_{k, z_i} + 1}{\phi_k + K}$, where $\phi_k = |\{j : y_j = k\}|$ and
$\phi_{k, z_i} = |\{j : y_j = k \wedge x_{ij} = z_i\}|$.
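A naive Bayes classifier over labelers with a uniform Dirichlet prior can be sketched as below. The add-one smoothing constants, the use of log-probabilities, and the choice to skip unlabeled entries (label 0) at prediction time are assumptions of this sketch; the function names are hypothetical.

```python
import math

def nb_fit(X_train, y_train, K):
    """Estimate smoothed class priors and per-labeler label distributions."""
    num_items, num_labelers = len(X_train), len(X_train[0])
    phi = [0] * (K + 1)                       # phi[k] = #{j : y_j = k}
    # cnt[k][i][z] = #{j : y_j = k and x_ij = z}
    cnt = [[[0] * (K + 1) for _ in range(num_labelers)] for _ in range(K + 1)]
    for xj, yj in zip(X_train, y_train):
        phi[yj] += 1
        for i, xij in enumerate(xj):
            cnt[yj][i][xij] += 1
    # Add-one smoothing (assumed form of the uniform Dirichlet prior).
    log_prior = [0.0] + [math.log((phi[k] + 1) / (num_items + K))
                         for k in range(1, K + 1)]
    log_g = [[[math.log((cnt[k][i][z] + 1) / (phi[k] + K)) for z in range(K + 1)]
              for i in range(num_labelers)]
             for k in range(K + 1)]
    return log_prior, log_g

def nb_predict(model, xj, K):
    """Return argmax_k of log pi_k + sum_i log g_k^i(x_ij), skipping unlabeled entries."""
    log_prior, log_g = model

    def score(k):
        return log_prior[k] + sum(log_g[k][i][xij]
                                  for i, xij in enumerate(xj) if xij != 0)

    return max(range(1, K + 1), key=score)

# Hypothetical training data: 3 labelers, 2 classes.
X_train = [[1, 1, 2], [1, 1, 1], [2, 2, 1], [2, 2, 2]]
y_train = [1, 1, 2, 2]
model = nb_fit(X_train, y_train, K=2)
print(nb_predict(model, [1, 1, 2], K=2), nb_predict(model, [2, 2, 0], K=2))
```

The smoothing keeps every estimated probability strictly positive, so a class that never appears in a small training set can still be predicted.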


IV. The Proposed Supervised Collective 
Classification Method 

We propose a saddle point algorithm for supervised collective
classification, where the weight of each labeler is the
solution of a convex optimization problem of the form

$$\min_{w} \; T\left( w, \{X_j\}_{j=1}^{N_s}, \{y_j\}_{j=1}^{N_s} \right) + \lambda R(w), \tag{16}$$

where $\lambda$ is the regularization parameter. The function T is
a convex surrogate loss function associated with the training
error $\frac{1}{N_s} \sum_{j=1}^{N_s} \mathbb{1}_{\{y_j \neq \hat{y}_j\}}$. In particular, we consider the
hinge loss function $h(z) = \max(0, 1 - z)$ and therefore
$T = \frac{1}{N_s} \sum_{j=1}^{N_s} h(y_j w^T X_j)$. The function R(w) is a convex
regularization function on the weighting vector w. In this
paper, we consider the $\ell_1$-norm regularization function, i.e.,
$R(w) = \|w\|_1 = \sum_{i=1}^{L} |w_i|$. The $\ell_1$-norm regularization
function favors a sparse structure of the weighting vector
w and therefore it aims to assign more weight to the experts
(labelers with high classification accuracy) hidden in the
crowds.

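With the hinge loss function, the formulation in (16) can be
rewritten as

$$\min_{w, \xi} \; \frac{1}{N_s} \sum_{j=1}^{N_s} \xi_j + \lambda R(w) \tag{17}$$

subject to $y_j w^T X_j \geq 1 - \xi_j$, $\xi_j \geq 0$, $j = 1, 2, \ldots, N_s$,

where $\xi_j$ accounts for the soft margin of the classifier [23].
The Lagrangian of (17) is

$$\mathcal{L}(w, \xi, \alpha, \beta) = \frac{1}{N_s} \sum_{j=1}^{N_s} \xi_j + \lambda R(w) - \sum_{j=1}^{N_s} \alpha_j \left( y_j w^T X_j - 1 + \xi_j \right) - \sum_{j=1}^{N_s} \beta_j \xi_j, \tag{18}$$

where $\alpha_j, \beta_j \geq 0$ are the Lagrange multipliers. The dual
optimization problem of (17) becomes

$$\max_{\alpha, \beta: \, \alpha_j, \beta_j \geq 0} \; \min_{w, \xi} \; \mathcal{L}(w, \xi, \alpha, \beta). \tag{19}$$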

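Fixing $\alpha$, $\beta$, and w, the value $\xi$ that minimizes $\mathcal{L}$ will satisfy
the following equation:

$$\frac{\partial \mathcal{L}}{\partial \xi_j} = \frac{1}{N_s} - \alpha_j - \beta_j = 0. \tag{20}$$

Note that (20) implies $0 \leq \alpha_j \leq \frac{1}{N_s}$. Substituting (20) into
(18), the Lagrangian can be simplified to

$$\mathcal{L}(w, \alpha) = \lambda R(w) - \sum_{j=1}^{N_s} \alpha_j \left( y_j w^T X_j - 1 \right). \tag{21}$$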
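The simplification from (18) to (21) can be made explicit by grouping the terms of the Lagrangian that multiply $\xi_j$. The following worked step is a sketch that uses only the symbols already defined for (17) and (18).

```latex
\begin{align*}
\mathcal{L}(w,\xi,\alpha,\beta)
 &= \lambda R(w) - \sum_{j=1}^{N_s} \alpha_j \left( y_j w^T X_j - 1 \right)
    + \sum_{j=1}^{N_s} \left( \frac{1}{N_s} - \alpha_j - \beta_j \right) \xi_j \\
 &= \lambda R(w) - \sum_{j=1}^{N_s} \alpha_j \left( y_j w^T X_j - 1 \right)
    \qquad \text{when } \frac{1}{N_s} - \alpha_j - \beta_j = 0 \text{ for all } j .
\end{align*}
```

Because $\beta_j = \frac{1}{N_s} - \alpha_j$ must be nonnegative and $\alpha_j \geq 0$, the multiplier $\alpha_j$ is confined to the interval $[0, \frac{1}{N_s}]$, which is the box constraint used in the outer maximization.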

Therefore the dual optimization problem becomes

$$\max_{\alpha: \, 0 \leq \alpha_j \leq \frac{1}{N_s}} \; \min_{w} \; \mathcal{L}(w, \alpha). \tag{22}$$

The solution to (22) is a saddle point of $\mathcal{L}$ that can be obtained
by iteratively solving the inner and outer optimization problems
and updating the corresponding parameters in (22) [25].


Since our regularization function $R(w) = \|w\|_1$ is not
differentiable when $w_i = 0$ for some i, we use the subgradient
method [25], [26] to solve the inner optimization problem. The
subgradient g of $\|w\|_1$ at a point $w_0$ has to satisfy
$\|w\|_1 \geq \|w_0\|_1 + g^T (w - w_0)$ for all w. Consider a one-dimensional
regularizer function $|w|$. Since $|w|$ is differentiable everywhere
except at $w = 0$, substituting $w_0 = 0$ we have the
constraint on the subgradient at 0 that $g \in [-1, 1]$.
For $w \neq 0$, g is the gradient of $|w|$, i.e., $g = 1$ if $w > 0$ and
$g = -1$ if $w < 0$. Extending these results to $R(w) = \|w\|_1$, we
define the (entrywise) projection operator of an L-dimensional
vector g as $\mathrm{Proj}_g(g) = [\mathrm{Proj}_g(g_1), \ldots, \mathrm{Proj}_g(g_L)]^T$,
where

$$\mathrm{Proj}_g(g_i) = \begin{cases} g_i, & \text{if } |g_i| \leq 1, \\ \dfrac{g_i}{\|g\|_\infty}, & \text{if } |g_i| > 1, \end{cases} \tag{23}$$

and $\|g\|_\infty = \max_i |g_i|$ is the infinity norm of g. Therefore the
projection operator $\mathrm{Proj}_g$ guarantees that g is
a feasible subgradient of $\|w\|_1$.


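Fixing $\alpha$, differentiating $\mathcal{L}$ with respect to w by using the
subgradient g as the gradient at the non-differentiable points
gives

$$g = \frac{1}{\lambda} \sum_{j=1}^{N_s} \alpha_j y_j X_j. \tag{24}$$

By the subgradient method, the iterate of w at stage t + 1 given
$\alpha^{(t)}$ is updated by

$$w^{(t+1)} = w^{(t)} \pm s_w \, \mathrm{Proj}_g\left( \frac{1}{\lambda} \sum_{j=1}^{N_s} \alpha_j^{(t)} y_j X_j \right), \tag{25}$$

where $s_w$ is the constant step length and we set $w^{(0)} = \mathbf{0}$, the
vector of all zeros. The sign of the subgradient is determined
so that $\mathcal{L}(w^{(t+1)}, \alpha^{(t)}) \leq \mathcal{L}(w^{(t)}, \alpha^{(t)})$.

Similarly, for the outer optimization problem, given $w^{(t+1)}$,
the gradient of $\mathcal{L}$ in (21) with respect to $\alpha_j$ is $1 - y_j w^{(t+1)T} X_j$.
Since $0 \leq \alpha_j \leq \frac{1}{N_s}$, define the (entrywise) projection operator
of an $N_s$-dimensional vector $\alpha$ as

$$\mathrm{Proj}_\alpha(\alpha_j) = \begin{cases} \alpha_j, & \text{if } 0 \leq \alpha_j \leq \frac{1}{N_s}, \\ \dfrac{\alpha_j}{\|\alpha\|_\infty N_s}, & \text{if } \alpha_j > \frac{1}{N_s}, \\ 0, & \text{if } \alpha_j < 0. \end{cases} \tag{26}$$

The projection operator $\mathrm{Proj}_\alpha$ projects $\alpha$ onto its feasible set.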

The iterate of $\alpha$ at stage t + 1 given $w^{(t+1)}$ is updated by

$$\alpha^{(t+1)} = \mathrm{Proj}_\alpha\left( \alpha^{(t)} + s_\alpha \, \mathrm{vec}\left( 1 - y_j w^{(t+1)T} X_j \right) \right), \tag{27}$$

where $s_\alpha$ is the constant step length, and
$\mathrm{vec}\left( 1 - y_j w^{(t+1)T} X_j \right) = [1 - y_1 w^{(t+1)T} X_1, \ldots, 1 - y_{N_s} w^{(t+1)T} X_{N_s}]^T$.
Since $\alpha$ relates to the vector of
importance of the training samples, we set $\alpha^{(0)} = \frac{1}{N_s} \mathbf{1}$
as the initial point, which means that all training samples
are assumed to be equally important in the first place. The
algorithm keeps updating the parameters $\alpha$ and w until they
both converge. In this paper we set the convergence criterion
to be the $\ell_2$ norm (Euclidean distance) between the old and
newly updated parameters (e.g., the $\ell_2$ norm is less than
0.01). The proposed algorithm is summarized as follows:


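Algorithm 1 The proposed supervised collective classification
algorithm

Input: training samples $\{X_j\}_{j=1}^{N_s}$, training labels $\{y_j\}_{j=1}^{N_s}$,
    regularization parameter $\lambda$
Output: optimal weighting vector $w^*$
Initialization: $\alpha^{(0)} = \frac{1}{N_s} \mathbf{1}$, $w^{(0)} = \mathbf{0}$, $t = 0$
while $\alpha^{(t)}$ and $w^{(t)}$ do not converge do
    Compute $g = \frac{1}{\lambda} \sum_{j=1}^{N_s} \alpha_j^{(t)} y_j X_j$
    if $\mathcal{L}(w^{(t)} - s_w \mathrm{Proj}_g(g), \alpha^{(t)}) < \mathcal{L}(w^{(t)}, \alpha^{(t)})$ then
        $w^{(t+1)} = w^{(t)} - s_w \mathrm{Proj}_g(g)$
    else
        $w^{(t+1)} = w^{(t)} + s_w \mathrm{Proj}_g(g)$
    end if
    $\alpha^{(t+1)} = \mathrm{Proj}_\alpha\left( \alpha^{(t)} + s_\alpha \mathrm{vec}\left( 1 - y_j w^{(t+1)T} X_j \right) \right)$
    $t = t + 1$
end while
For the robust algorithm, set $w_i^* = w_i^* \mathbb{1}_{\{w_i^* > 0\}}$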


Since (16) imposes no positivity constraint on the elements
of the weighting vector w, some entries of w can be negative,
which implies that one should not trust the labels generated
by labelers with negative weights for collective classification.
However, in practice, altering a labeler's labels might be too
aggressive and result in non-robust classification of the
test data. One way to alleviate this situation is to truncate the
weights by keeping $w_i$ if $w_i > 0$ and setting $w_i = 0$ if $w_i \leq 0$.
That is, the labels from reliable labelers are preserved, whereas
the labels from unreliable labelers are discarded. We refer to
this approach as the proposed robust method. Note that the
proposed saddle point algorithm can be adjusted to different convex
surrogate loss functions T and convex regularization functions
R following the same methodology.
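To make the iteration concrete, the following pure-Python sketch follows Algorithm 1 for binary labels in $\{-1, 0, 1\}$ with the hinge loss and $\ell_1$ regularizer. It is an illustration under stated assumptions rather than the authors' implementation: the step lengths, the iteration cap `max_iter`, the helper names, and the toy data are all hypothetical choices.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def lagrangian(w, alpha, X, y, lam):
    """Simplified Lagrangian of Eq. (21)."""
    return lam * sum(abs(v) for v in w) - sum(
        a * (yj * dot(w, xj) - 1.0) for a, xj, yj in zip(alpha, X, y))

def proj_g(g):
    """Entrywise projection of Eq. (23)."""
    g_inf = max(abs(v) for v in g)
    return [v if abs(v) <= 1.0 else v / g_inf for v in g]

def proj_alpha(alpha):
    """Entrywise projection of Eq. (26) onto [0, 1/N_s]."""
    n = len(alpha)
    a_inf = max(abs(a) for a in alpha)
    return [0.0 if a < 0 else (a / (a_inf * n) if a > 1.0 / n else a)
            for a in alpha]

def saddle_point(X, y, lam, s_w=0.1, s_a=0.1, tol=0.01, max_iter=1000):
    """Alternate the w-update (25) and the alpha-update (27) until both settle."""
    n_items, n_labelers = len(X), len(X[0])
    alpha = [1.0 / n_items] * n_items
    w = [0.0] * n_labelers
    for _ in range(max_iter):
        # Eq. (24): direction computed from the current multipliers.
        g = [sum(a * yj * xj[i] for a, xj, yj in zip(alpha, X, y)) / lam
             for i in range(n_labelers)]
        step = [s_w * v for v in proj_g(g)]
        w_minus = [wi - d for wi, d in zip(w, step)]
        w_plus = [wi + d for wi, d in zip(w, step)]
        # Pick the sign as in Algorithm 1.
        if lagrangian(w_minus, alpha, X, y, lam) < lagrangian(w, alpha, X, y, lam):
            w_new = w_minus
        else:
            w_new = w_plus
        # Eq. (27): projected gradient ascent on alpha.
        alpha_new = proj_alpha([a + s_a * (1.0 - yj * dot(w_new, xj))
                                for a, xj, yj in zip(alpha, X, y)])
        dw = math.sqrt(sum((p - q) ** 2 for p, q in zip(w_new, w)))
        da = math.sqrt(sum((p - q) ** 2 for p, q in zip(alpha_new, alpha)))
        w, alpha = w_new, alpha_new
        if dw < tol and da < tol:
            break
    return w

def robust_weights(w):
    """Truncate non-positive weights to zero (the robust variant)."""
    return [wi if wi > 0 else 0.0 for wi in w]

# Toy data: labeler 0 always agrees with the truth, labeler 1 always disagrees.
X = [[1, -1], [-1, 1], [1, -1], [-1, 1]]
y = [1, -1, 1, -1]
w = saddle_point(X, y, lam=0.5)
print(w, robust_weights(w))
```

On this toy data the reliable labeler receives a positive weight and the adversarial labeler a negative one; the robust variant then zeroes out the negative weight instead of flipping that labeler's votes.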

V. Performance Evaluation 

We compare the crowdsourcing algorithms introduced in
Sec. III with the proposed method in Sec. IV on a canonical
crowdsourced dataset and the collected multiple-choice exam
datasets. The canonical dataset is the text relevance judgment
dataset provided in the Text REtrieval Conference (TREC)
Crowdsourcing Track in 2011 [27], where labelers are asked to judge
the relevance of paragraphs excerpted from a subset of articles
with given topics. Each labeler then generates a binary label
that is either “relevant” or “irrelevant”. This dataset is sparse
in the sense that on average each labeler only labels
roughly 26 articles out of 394 articles in total. The exam
datasets contain a science exam with 40 questions and a math
exam with 30 questions. There are 4 choices for each question,
and therefore this is a typical multi-class machine learning
task. These datasets are quite dense in the sense that almost
every student generates an answer for each question.

The oracle classifier used for comparison is the
best labeler in the crowds. All tuning parameters are determined
by leave-one-out cross-validation (LOOCV),
sweeping from 0 to 200 on the training data. The classification
accuracies are listed in Table I, where the number in parentheses in the
row of the best labeler is the number of items correctly classified
by the best labeler, and the classifier with the highest
classification accuracy is marked in boldface.

For the TREC2011 dataset, when 10 percent of the items (40
items) are used to train the classifier, majority vote leads to
a classification accuracy of around 0.8. The classification accuracy
improves notably with weighted averaging, conventional
SVM, and the supervised EM algorithm. The naive Bayes sampler
performs worse due to the limited training samples.
Note that the proposed robust algorithm outperforms the others
by assigning more weight to reliable labelers and discarding
labels from unreliable labelers.

The science dataset is perhaps the most challenging one
since there are no perfect experts (i.e., labelers with classification
accuracy 1) in the crowds and most students
do not provide accurate answers. Despite these difficulties, our
proposed method still attains the highest accuracy among the
compared methods (0.5333, tied with the naive Bayes sampler). Note that in this
case the proposed non-robust and robust methods have the
same classification accuracy since this dataset is non-sparse
and the accuracy of answers in the training data and test data
is highly consistent.

The math dataset is a relatively easy task since the majority 
of students have correct answers. Consequently unsupervised 
methods tend to have the same performance as the oracle 
classifier. In case of limited training data size (5 training 
samples), some supervised methods such as naive Bayes 
sampler and multi-class SVM suffer performance degradation 
due to insufficient training samples, whereas the proposed 
method attains perfect classification. 

VI. Conclusion 

This paper provides an overview of unsupervised and supervised
algorithms for crowdsourcing and proposes a supervised
collective classification method where the weight of
each labeler is determined via a saddle point algorithm. The
proposed method is capable of distinguishing reliable labelers
from unreliable ones to enhance the classification accuracy
with limited training samples. The results on a benchmark
crowdsourced dataset and the exam datasets collected by the
authors show that the proposed method outperforms the other
algorithms. This suggests that supervised collective classification
methods with limited training samples can be crucial
for crowdsourcing and relevant applications.





TABLE I

Descriptions of crowdsourced datasets and classification accuracy.

Description / Dataset              TREC2011   TREC2011   Science    Math       Math
training data size (N_s)           40         60         10         5          10
test data size (N - N_s)           354        334        30         25         20
number of labelers (L)             689        689        183        559        559

Method / Classification accuracy   TREC2011   TREC2011   Science    Math       Math
best labeler (oracle)              1 (82)     1 (84)     0.7 (30)   1 (25)     1 (20)
majority vote                      0.7938     0.7964     0.4667     1          1
weighted averaging                 0.8305     0.8323     0.4667     1          1
exponential weighted algorithm     0.8051     0.8084     0.2667     0.36       0.4
conventional SVM                   0.8333     0.8413     0.5        0.96       1
multi-class SVM                    X          X          0.4333     0.52       0.7
unsupervised EM                    0.7881     0.7784     0.5        1          1
supervised EM                      0.8277     0.8174     0.5        1          1
naive Bayes sampler                0.6921     0.6707     0.5333     0.64       1
multi-class Adaboost               0.8051     0.7994     0.5167     0.8489     0.885
The proposed method                0.8277     0.8323     0.5333     1          1
The proposed robust method         0.8446     0.8413     0.5333     1          1


Acknowledgement 

The first author would like to thank Tianpei Xie and Dejiao 
Zhang at the University of Michigan for useful discussions. 


References 

[1] J. Surowiecki, The Wisdom of Crowds. Anchor, 2005. 

[2] R. J. Prill, J. Saez-Rodriguez, L. G. Alexopoulos, P. K. Sorger, and
G. Stolovitzky, “Crowdsourcing network inference: The dream predictive
signaling network challenge,” Science Signaling, vol. 4, no. 189,
Sept. 2011.

[3] D. R. Choffnes, F. E. Bustamante, and Z. Ge, “Crowdsourcing
service-level network event monitoring,” SIGCOMM Comput. Commun. Rev.,
vol. 40, no. 4, pp. 387-398, Aug. 2010.

[4] C. Robson, “Using mobile technology and social networking to
crowdsource citizen science,” Ph.D. dissertation, Berkeley, CA, USA, 2012.

[5] J. Albors, J. Ramos, and J. Hervas, “New learning network paradigms: 
Communities of objectives, crowdsourcing, wikis and open source,” 
International Journal of Information Management, vol. 28, no. 3, pp. 
194-202, 2008. 

[6] M. Nitti, R. Girau, and L. Atzori, “Trustworthiness management in the 
social internet of things,” IEEE Trans. Knowl. Data Eng., vol. 26, no. 5, 
pp. 1253-1266, May 2014. 

[7] I.-R. Chen, F. Bao, and J. Guo, “Trust-based service management for 
social internet of things systems,” IEEE Trans. Dependable Secure 
Comput., vol. PP, no. 99, pp. 1-1, 2015. 

[8] M. Nitti, L. Atzori, and I. Cvijikj, “Friendship selection in the social 
internet of things: Challenges and possible strategies,” IEEE Internet 
Things J., vol. 2, no. 3, pp. 240-247, June 2015. 

[9] J. Howe, “The rise of crowdsourcing,” Wired Magazine, vol. 14, no. 6, 
2006. 

[10] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood 
from incomplete data via the EM algorithm,” Journal of the Royal 
Statistical Society: Series B, vol. 39, pp. 1-38, 1977. 

[11] A. P. Dawid and A. M. Skene, “Maximum likelihood estimation of
observer error-rates using the EM algorithm,” Journal of the Royal
Statistical Society. Series C (Applied Statistics), vol. 28, no. 1,
pp. 20-28, 1979.

[12] V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni,
and L. Moy, “Learning from crowds,” J. Mach. Learn. Res., vol. 11,
pp. 1297-1322, Aug. 2010.

[13] D. Zhou, J. Platt, S. Basu, and Y. Mao, “Learning from the wisdom 
of crowds by minimax entropy,” in Advances in Neural Information 
Processing Systems (NIPS), 2012, pp. 2204-2212. 


[14] O. Dekel and O. Shamir, “Vox populi: Collecting high-quality labels 
from a crowd,” in Proceedings of the 22nd Annual Conference on 
Learning Theory, 2009. 

[15] S. Ertekin, H. Hirsh, and C. Rudin, “Approximating the wisdom of the
crowd,” in Advances in Neural Information Processing Systems (NIPS).
Workshop on Computational Social Science and the Wisdom of Crowds,
2011.

[16] P. G. Ipeirotis, F. Provost, and J. Wang, “Quality management on amazon 
mechanical turk,” in Proceedings of the ACM SIGKDD Workshop on 
Human Computation, 2010, pp. 64-67. 

[17] W. Tang and M. Lease, “Semi-Supervised Consensus Labeling for
Crowdsourcing,” in Proceedings of the ACM SIGIR Workshop on
Crowdsourcing for Information Retrieval (CIR), July 2011, pp. 36-41.

[18] P. Ipeirotis, F. Provost, V. Sheng, and J. Wang, “Repeated labeling 
using multiple noisy labelers,” Data Mining and Knowledge Discovery, 
vol. 28, no. 2, pp. 402-441, 2014. 

[19] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., 
vol. 20, no. 3, pp. 273-297, Sept. 1995. 

[20] K. Crammer and Y. Singer, “On the algorithmic implementation of 
multiclass kernel-based vector machines,” J. Mach. Learn. Res., vol. 2, 
pp. 265-292, Mar. 2002. 

[21] R. Snow, B. O’Connor, D. Jurafsky, and A. Y. Ng, “Cheap and fast-but is 
it good?: Evaluating non-expert annotations for natural language tasks,” 
in Proceedings of the Conference on Empirical Methods in Natural 
Language Processing (EMNLP), 2008, pp. 254-263. 

[22] J. Zhu, S. Rosset, H. Zou, and T. Hastie, “Multi-class AdaBoost,” Tech. 
Rep., 2006. 

[23] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical 
Learning, ser. Springer Series in Statistics, 2001. 

[24] N. Littlestone and M. K. Warmuth, “The weighted majority algorithm,” 
Inf. Comput., vol. 108, no. 2, pp. 212-261, Feb. 1994. 

[25] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed 
optimization and statistical learning via the alternating direction method 
of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1-122, 
Jan. 2011. 

[26] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge 
University Press, 2004. 

[27] Text REtrieval Conference (TREC) Crowdsourcing Track. National
Institute of Standards and Technology (NIST), 2011. [Online].
Available: https://sites.google.com/site/treccrowd/home
















